Inferring Subcat Frames of Verbs in Urdu
Abstract
This paper describes an approach for inferring the syntactic frames of verbs in Urdu from an untagged corpus. Urdu, like many other South Asian languages, is a free word order, case-rich language. Separable lexical units, called case clitics, mark constituents for case in phrases and clauses. In Urdu there is not always a one-to-one correspondence between case clitic form and case, or between case and grammatical function. Case clitics therefore cannot serve as direct clues for extracting the syntactic frames of verbs, so a two-step approach has been implemented. In the first step, all case clitic combinations for a verb are extracted and the unreliable ones are filtered out by applying inferential statistics. In the second step, the occurrences of case clitic forms, both within whole combinations and individually, are processed to infer all possible syntactic frames of the verb.
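The first step can be illustrated with a minimal sketch. The abstract does not say which inferential statistic is used, so the binomial hypothesis test below is an assumption chosen for illustration, not the paper's method. The function names, the `noise_rate` and `alpha` parameters, and the toy observations for a single verb are all hypothetical.

```python
from collections import Counter
from math import comb


def binomial_tail(n: int, m: int, p: float) -> float:
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def filter_combinations(observations, noise_rate=0.05, alpha=0.05):
    """Keep clitic combinations that occur too often to be explained as noise.

    Assumption: a spurious combination shows up in any one clause with
    probability `noise_rate`; a combination is kept when the chance of
    seeing it at least `count` times under that assumption is <= `alpha`.
    """
    # Word order is free, so each combination is stored in sorted order
    # and the order of clitics within a clause is ignored.
    counts = Counter(tuple(sorted(obs)) for obs in observations)
    total = sum(counts.values())
    kept = {}
    for combo, count in counts.items():
        if binomial_tail(total, count, noise_rate) <= alpha:
            kept[combo] = count
    return kept


# Hypothetical clitic combinations observed with one verb (toy data).
observations = [("ne", "ko")] * 60 + [("ne",)] * 37 + [("se", "par")] * 3
reliable = filter_combinations(observations)
print(reliable)
```

With these toy counts, the two frequent combinations pass the test and the combination seen only three times is discarded. The surviving combinations would then feed the second step, which infers frames from them.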
Similar resources
Subcat-LMF: Fleshing out a standardized format for subcategorization frame interoperability
This paper describes Subcat-LMF, an ISO-LMF compliant lexicon representation format featuring a uniform representation of subcategorization frames (SCFs) for the two languages English and German. Subcat-LMF is able to represent SCFs at a very fine-grained level. We utilized Subcat-LMF to standardize lexicons with large-scale SCF information: the English VerbNet and two German lexicons, i.e., a sub...
Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform
In South Asian languages such as Hindi and Urdu, action verbs with compound constructions and serial verb constructions pose serious problems for natural language processing and other linguistic tasks. Urdu is an Indo-Aryan language spoken by 51,500,000 speakers in India. Action verbs that occur spontaneously in day-to-day communication are highly ambiguous in nature semantically and as a ...
The interaction of light verbs and verb classes of Urdu
The paper describes an attempt to identify Urdu verb classes on the basis of the distribution of light verbs with different main verbs. We started with a frequency analysis of main + light verb sequences. The analysis of that data led us to a thorough manual analysis of main + light verb sequences using native speaker judgments. We focused on the three most frequent light verbs dE 'give'...
Is Hypothesis Testing Useful for Subcategorization Acquisition?
Statistical filtering is often used to remove noise from automatically acquired subcategorization frames. In this paper, we compare three different approaches to filtering out spurious hypotheses. Two hypothesis tests perform poorly, compared to filtering frames on the basis of relative frequency. We discuss reasons for this and consider directions for future research.
Acquisition of Subcategorization Frames from Large Scale Texts
Subcategorization frames are useful for many applications. Because of many ambiguities, extracting them is not straightforward. In this paper, a probabilistic chunker is used to determine the plausible phrase boundaries and a finite state mechanism, SUBCAT-TRACTOR, is proposed to extract 23 subcategorization frames. In order to get rid of the problems introduced by compound nouns, a noun-phrase ext...